BACKGROUND

The Inspector General of the Marine Corps requested assistance in putting the
sampling  procedures  of  his  inspectors  on  a sounder  statistical  basis.  His  principal 
requirement  to  achieve  this  is  to  start  with  random  or  nearly-random  samples.  This 
aspect  of  the  problem  is  being  treated  as  a thesis  topic  at  the  Naval  Postgraduate  School. 
This  paper  examines  the  question  of  sample  size  requirements.  Recommended  sample 
sizes  are  derived  for  sampling  with  and  without  replacement.  Two  types  of  rating 
systems  (both  of  which  use  percentages)  are  treated:  one  with  two  ratings,  and  one 
with  more  than  two  ratings.  Actual  systems  have  more  than  two  ratings,  but  practical 
sample  sizes  may  require  treating  these  as  two-level  systems. 

METHOD  OF  ANALYSIS 

We must first establish what it is possible, in principle, to determine in advance of
an inspection. We assume that the assignment of individual ratings (physical fitness test
results, rifle qualification scores, etc.) is without error. This means that a repeated
test would give an unaltered result, which is a reasonable assumption since the criteria
used are broad: pass or fail, qualify or fail to qualify. The situation is then analogous
to taking samples of black and white balls from a box containing a known number of balls,
but with unknown proportions of black and white. In general, no sample size less than
100 percent will guarantee a sample that will justify a given statistical confidence in the
results. What is possible is to calculate a sample size such that, given a true proportion
p and unit size N, the significance level of the sample will be better than some level
α a specified percentage of the time.

For discussion, we take 90 percent as a practical minimal success probability, and
0.1 as a practical significance level. Given that we have correctly estimated the true
population proportion, our calculated sample sizes will then give ratings significant at
the 0.1 level or better (compared to some standard) in 90 percent of the cases. If the
true population proportion is closer to the standard than we assumed, we will be successful
less than 90 percent of the time; if farther, the frequency will be over 90 percent.
Since the ratings are continuous percentages, the actual unit performance could be extremely
close to the standard, so that the sample size tends to infinity for sampling with
replacement and to unit size without replacement. The results of our calculations will
show the tradeoff between sample size and the range of unit performances likely to be
properly and confidently classified.
In practice, the inspection planner can make a low estimate of unit performance, and
look up the sample size required to place that unit properly "most" of the time. If
practicable, a sample of that size is taken; if not, the largest practicable sample is used.




II. DETERMINING SAMPLE SIZE REQUIREMENTS FOR SPECIFIC LEVELS OF INSPECTION

SAMPLE  SIZES  WITH  TWO-LEVEL  RATINGS  (WITH  REPLACEMENT) 

Computationally, the simplest case is for sampling with replacement in a two-level
rating system. Actual rating systems will tend to have more than two levels, but
it will be shown that practical sample sizes do not allow finer distinctions than pass or
fail. Therefore, the two-level case tends to be the interesting one. We assume that
there is some criterion level, below which unit performance is unsatisfactory. In our
examples, this percentage is 84.5.


The method used is based on the Student t-distribution, with the percentage data
normalized by an arcsin transformation. The sample size S required is given by:

$$S = \frac{820.8\,[t_{\alpha} + t_{2(1-P)}]^{2}}{(\arcsin\sqrt{p} - \arcsin\sqrt{p_c})^{2}}$$

where α is the significance level chosen, P is the probability of attaining a level α,
p is the assumed true unit performance (a fraction), and p_c is the criterion performance
level. The constant is obtained from the variance of the arcsin function in degrees squared,
so the angles are measured in degrees. An "infinite" number of degrees of freedom is
assumed. The numerator is not a function of p or p_c, and can easily be tabulated for a
useful range of P and α. (Reference 1 tabulates twice the numerator, which is appropriate
when both percentages are measured. The last two columns of the table in reference 1,
labeled .01 and .001, respectively, should be labeled .02 and .01.)

Figure  1 and  table  1 show  the  results  of  this  calculation  for  success  probabilities 
ranging  from  80  to  99  percent.  Appendix  A gives  a computer  program  for  calculating 
these  sample  sizes. 
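
The arcsin formula above can be sketched in a few lines of Python. This is only an illustration, not the APL program of appendix A: the function name and arguments are hypothetical, and normal quantiles from `statistics.NormalDist` stand in for the t-table with "infinite" degrees of freedom, with t for α taken as two-tailed (an assumption that matches the 1.645 used for α = 0.1 in the appendix).

```python
import math
from statistics import NormalDist

ARCSIN_CONSTANT = 820.8  # constant quoted in the text (degrees squared)


def sample_size_with_replacement(p_true, p_crit, alpha=0.1, success=0.9):
    """Sketch of S = 820.8 (t_alpha + t_2(1-P))^2 / (difference of arcsin angles)^2."""
    z = NormalDist()
    t_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-tailed value at level alpha
    t_power = z.inv_cdf(success)            # two-tailed value at level 2(1 - P)

    def angle(p):
        # arcsin of the square root of a fraction, in degrees
        return math.degrees(math.asin(math.sqrt(p)))

    delta = angle(p_true) - angle(p_crit)
    size = ARCSIN_CONSTANT * (t_alpha + t_power) ** 2 / delta ** 2
    return math.floor(size + 0.5)  # round to the nearest whole individual


if __name__ == "__main__":
    for rating in (0.90, 0.95, 1.00):
        print(rating, sample_size_with_replacement(rating, 0.845))
```

With the criterion at 84.5 percent and the defaults shown, the sketch gives 312, 67, and 13 for true ratings of 0.90, 0.95, and 1.00, which agree with the 90-percent column of table 1.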

Note  that  the  formula  applies  to  the  difference  between  two  percentages,  either  of 
which  could  be  the  greater  of  the  two.  Except  for  terminology,  the  same  calculations 
serve  to  determine  the  sample  size  required  to  show  that  true  unit  performance  is  below 
some  criterion  level.  A program  for  this  calculation  is  also  given  in  appendix  A. 


See  reference  1,  pages  609-10. 
TABLE 1

SAMPLE SIZES REQUIRED WITH REPLACEMENT TO SHOW
THAT A UNIT RATING IS ABOVE A CRITERION LEVEL (a)

True unit          Success probability (percent)
rating          80       85       90       95       99

0.85         31967    37292    44278    55959    81538
0.86          3454     4029     4784     6046     8810
0.87          1207     1408     1672     2113     3078
0.88           596      696      826     1044     1521
0.89           349      407      483      610      889
0.90           225      262      312      394      574
0.91           155      180      214      271      395
0.92           111      130      154      195      284
0.93            82       96      114      144      210
0.94            63       73       87      110      160
0.95            48       56       67       84      123
0.96            37       44       52       65       95
0.97            29       34       40       51       74
0.98            22       26       31       39       57
0.99            17       19       23       29       43
1.00             9       11       13       17       24

(a) Criterion level = 84.5 percent
    Significance level = 0.1

SAMPLE  SIZES  WITH  THREE-LEVEL  RATINGS  (WITH  REPLACEMENT) 

With three or more levels, it may be desired to determine that a unit's true performance
is within a given rating band, meaning that it is simultaneously above a lower
criterion level and below an upper level. Consider the distribution of sample ratings
for some true unit performance. For a given sample size, some set of the possible sample
ratings will justify the statement that at some level of significance the true rating is above
the criterion level. This set of ratings will constitute a fraction F_above of the possible
sample outcomes, and will cluster as far as possible from the lower criterion rating. A
similar statement can be made with respect to the upper criterion level, where the fraction
will be called F_below. The area of the sample distribution for which the statement
can be made about both criterion levels simultaneously is equal to the area of either F,
less the area of that F not occupied by the other F (figure 2). We call this area F_inside;
that is, F_inside = F_above + F_below - 1.

For example, to obtain a 90-percent chance of a successful sample, we need
F_inside = 0.9. The sum of the other Fs is then 1.9.

Available tables permit calculation of probabilities as high as 99.5 percent. These
probabilities and the following procedure were used to determine the sample sizes for
which F_inside has some specified value, say 0.9. First, for some assumed unit rating,
calculate the sample sizes required for a range of success probabilities, with respect
to one of the two criterion levels. Using these sample sizes, calculate the t-values
associated with a comparison of the unit rating with the other criterion level; that is,
calculate t_{2(1-P)}. From a fifth-degree polynomial fit of P versus t_{2(1-P)}, calculate
the success probabilities with respect to the second criterion level. (All Ps greater than
.995 are taken equal to one.) The sum of the two success probabilities, less one, is
the success probability for a sample being inside the rating band. The range of F_above,
say, gives a range of F_below and, therefore, a range of F_inside. The last are logarithmically
interpolated to obtain the sample sizes required for various selected success
probabilities. Some of these sample sizes are shown in figure 3 and table 2. Appendix A
gives a program for calculating these probabilities and sample sizes.
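
The in-band probability described above can be illustrated with a short Python sketch. It is not the procedure of appendix A: it evaluates the normal distribution directly (via `statistics.NormalDist`) instead of using the fifth-degree polynomial fit, and it searches sample sizes one by one instead of interpolating logarithmically. All function names are hypothetical, and the results should be expected to agree with table 2 only approximately.

```python
import math
from statistics import NormalDist

ARCSIN_CONSTANT = 820.8  # degrees squared, as in the two-level formula


def angle_deg(p):
    """Arcsin of the square root of a fraction, in degrees."""
    return math.degrees(math.asin(math.sqrt(p)))


def in_band_probability(p_true, size, lower=0.845, upper=0.98, alpha=0.1):
    """F_inside = F_above + F_below - 1 for a given sample size (sketch)."""
    z = NormalDist()
    t_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    scale = math.sqrt(size / ARCSIN_CONSTANT)

    def one_level(p_crit):
        # success probability against a single criterion level
        delta = abs(angle_deg(p_true) - angle_deg(p_crit))
        return z.cdf(delta * scale - t_alpha)

    return max(0.0, one_level(lower) + one_level(upper) - 1.0)


def smallest_in_band_size(p_true, target=0.9, **criteria):
    """Smallest sample size whose in-band probability reaches the target.

    Assumes p_true lies strictly between the two criterion levels.
    """
    size = 1
    while in_band_probability(p_true, size, **criteria) < target:
        size += 1
    return size


if __name__ == "__main__":
    print(round(in_band_probability(0.93, 161), 3))
    print(smallest_in_band_size(0.93))
```

For a true rating of 93 percent, the sketch puts the 90-percent requirement near 160, close to the 161 shown in table 2; small differences are to be expected because the sketch does not use the polynomial fit or the interpolation.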

SAMPLING  WITH  REPLACEMENT 

With replacement, the required sample size can easily exceed the unit size. In these
cases, we could simply sample the entire unit. However, with our assumptions about
the repeatability of test scores, it is not necessarily true that the entire unit must frequently
be measured. The distribution of the number of different individuals who would
be sampled in a given case (with given unit and sample sizes) can be calculated. If the
mean of this distribution is appreciably less than the unit size, an appropriate sample
can be obtained by counting randomly selected individuals more than once. As an
example of the typical sample size savings that can be obtained in this way, we will
calculate the most likely number of different individuals in a sample. Feller (reference 2,
page 92) shows how to calculate exactly the distribution of the number of different individuals.
The formula, however, requires double precision calculations on the Burroughs
B6700 computer to handle sample and unit sizes much over 30. Even double precision
calculations will not permit the range of unit sizes required. Fortunately, a drastic
simplification is available. First, the large-number limit of the exact formula is the
Poisson distribution (reference 2, pages 93-94). Second, for our purpose of finding the
peak of the distribution, the exact and the Poisson distributions coincide above unit and
sample sizes of 15 (determined empirically). Finally, the parameter of the Poisson
distribution gives (almost exactly) the most likely number of individuals not in the sample,
so that no Poisson terms need be calculated. The parameter is

$$\lambda = N \exp(-R/N),$$

where N is the unit size, and R is the sample size (with replacement). Forward
differences suggest that λ is exactly the most likely number of missing individuals;
backward differences suggest that λ is one unit too large, and empirical calculations
show that λ is either correct or one unit too small. In practice, since the distributions
tend to be very broad, we can take the required sample size (number of different
individuals) to be N - λ.

TABLE 2

SAMPLE SIZES REQUIRED WITH REPLACEMENT
TO SHOW THAT A UNIT RATING IS IN A RATING LEVEL (a)

True unit          Success probability (percent)
rating          80       85       90       95       99

85           32520    38425    45358    58045    92099
86            3514     4151     4900     6272     9951
87            1228     1451     1712     2191     3477
88             606      717      846     1083     1718
89             355      419      495      633     1005
90             229      271      319      409      648
91             159      187      219      281      446
92             132      147      169      206      320
93             127      140      161      193      281
94             147      170      197      252      400
95             225      266      313      402      637
96             445      526      621      794     1260
97            1518     1794     2118     2710     4299

(a) Lower criterion level = 84.5 percent
    Upper criterion level = 98 percent
    Significance level = 0.1

Some exact answers are given in table 3. The bottom line shows the requirements
for units of infinite size. The other entries show the numbers of different individuals in a
typical sample of the size shown at the bottom of the table, taken with replacement from
a unit of the size given at the left. These entries are limited in size by both the unit size
and the sample size for an infinite unit. The sample size savings implied by these entries are
shown in table 4. Table 4 indicates that, while there are combinations of unit size and true
ratings for which weighting the sample can produce significant savings in sample size, the
savings are usually too small to justify the complexities.
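
As an illustration of the Poisson shortcut, the following Python sketch computes the typical number of different individuals as the unit size less N exp(-R/N). The function name is hypothetical, and rounding down to a whole individual is an assumption made here because it reproduces the entries of table 3 that were checked; the report's own entries are described as exact answers and may come from a different calculation.

```python
import math


def typical_distinct_individuals(unit_size, draws):
    """Typical number of different individuals among `draws` selections
    made with replacement from a unit of `unit_size` people (sketch)."""
    missing = unit_size * math.exp(-draws / unit_size)  # Poisson parameter
    return math.floor(unit_size - missing)              # assumed rounding rule


if __name__ == "__main__":
    # Sample sizes 312 and 826 are the infinite-unit requirements for true
    # ratings of 90 and 88 percent (bottom line of table 3).
    for unit_size, draws in ((100, 312), (200, 312), (1000, 826)):
        print(unit_size, draws, typical_distinct_individuals(unit_size, draws))
```

For example, a sample of 312 selections from a unit of 100 typically contains about 95 different individuals, so the savings relative to inspecting the whole unit is about 5, in line with the corresponding entry of table 4.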

SAMPLE  SIZES  WITH  TWO-LEVEL  RATINGS  (WITHOUT  REPLACEMENT) 

The  traditional  way  to  take  inspection  samples  is  without  replacement.  Conceptually, 
this  method  is  also  the  simplest:  No  one  is  sampled  or  counted  more  than  once.  In  this 
case,  the  sampling  distribution  is  given  by  the  hypergeometric  distribution  (reference  3, 
page  193);  when  the  population  is  much  larger  than  the  sample,  the  binomial  distribution 
is  adequate. 

The  hypergeometric  distribution  has  four  parameters  and  requires  computation  of  many 
factorials.  The  terms  of  the  distribution  are,  therefore,  not  very  convenient  for  calculation 
or  tabulation.  Nevertheless,  a computer  makes  it  feasible  to  use  the  hypergeometric 
distribution  for  this  problem.  The  method  is  somewhat  complicated. 

There  are  two  classes  of  hypergeometric  distributions  involved.  We  first  assume  a 
unit  (population)  size,  a desired  significance  level,  and  a desired  success  probability. 
Suppose  the  latter  two  to  be  0.1  and  90  percent,  respectively.  We  also  require  a criterion 
level,  which  is  taken  to  be  (as  usual)  84.5  percent. 

Given the criterion level and the unit size, we can calculate the cumulative distributions
for a set of sample sizes. When these are tabulated, there will be, for each sample size, a
point beyond which a unit performing exactly at the criterion level would produce sample
results, at most, ten percent of the time. That is, the number of successes at that point
would be exceeded in, at most, ten percent of the samples of the indicated size. The
limiting number of failures could be graphed versus sample size, or divided by sample
size and graphed as failure rate versus sample size.
TABLE 3

TYPICAL SAMPLE SIZES REQUIRED FOR
WEIGHTED SAMPLING (LOWER CRITERION = 84.5 percent)

                       True unit rating (percent)
Unit size         85       86       88       90       95      100

100              100      100      100       95       48       12
200              200      200      196      157       56       12
300              300      300      280      193       60       12
500              500      500      404      232       62       12
1000            1000      991      562      268       64       12
2000            2000     1817      676      288       65       12
3000            3000     2391      722      296       66       12
4000            4000     2790      746      300       66       12
5000            5000     3079      761      302       66       12
6000            5996     3296      771      304       66       12
7000            6987     3465      779      305       66       12

Without
replacement    44278     4784      826      312       67       13
TABLE 4

TYPICAL SAVINGS IN SAMPLE SIZES BY USING
WEIGHTED SAMPLING (LOWER CRITERION = 84.5 percent)

                       True unit rating (percent)
Unit size         85       86       88       90       95      100

100                0        0        0        5       19        1
200                0        0        4       43       11        1
300                0        0       20      107        7        1
500                0        0       96       80        5        1
1000               0        9      264       44        3        1
2000               0      183      150       24        2        1
3000               0      609      104       16        1        1
4000               0     1210       80       12        1        1
5000               0     1705       65       10        1        1
6000               4     1488       55        8        1        1
7000              13     1319       47        7        1        1
We now know the results that give satisfactory significance levels with given size
samples,  but  not  the  probability  of  obtaining  those  results.  To  determine  this  probability, 
we  repeat  the  previous  calculation,  using  a postulated  true  unit  rating  in  place  of  the 
criterion  level.  The  tabulation  now  shows  success  probabilities  when  the  true  unit  rating 
has  the  assumed  value.  There  will  be  a region  in  this  table  such  that  90  percent  of  the 
samples  will  be  found  in  the  region,  for  a given  sample  size.  Again  the  low  failure  rate 
end  is  selected.  This  table  produces  another  graph  of  failure  rate  versus  sample  size, 
which  is  to  be  superimposed  on  the  first  graph.  The  intersection  point  represents 
failure  probabilities  that,  with  the  indicated  sample  size,  will  be  bettered  90  percent  of 
the  time  (given  our  assumed  true  unit  rating)  and  would  occur,  at  most,  ten  percent  of 
the  time  if  the  true  unit  rating  were  really  the  criterion  value.  Repeating  this  calculation 
produces  the  required  sample  sizes  without  replacement  as  a function  of  unit  size,  true 
unit  performance  rating,  significance  level,  and  success  probability.  Table  5 shows  the 
results. 
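
The exact procedure described above can be sketched in Python as follows. This is a simplified stand-in for the programs of appendix A, not a transcription of them: it uses exact integer binomial coefficients from `math.comb` rather than logarithms, searches sample sizes in order instead of graphing, and converts each rating to a whole number of passing individuals by rounding. All names are hypothetical.

```python
from math import comb


def pass_count_pmf(unit_size, passes_in_unit, sample_size):
    """Hypergeometric probabilities of 0..sample_size passes in the sample."""
    total = comb(unit_size, sample_size)
    fails_in_unit = unit_size - passes_in_unit
    return [comb(passes_in_unit, x) * comb(fails_in_unit, sample_size - x) / total
            for x in range(sample_size + 1)]


def required_sample_size(unit_size, p_crit, p_true, alpha=0.1, success=0.9):
    """Smallest sample (without replacement) that is significant at level
    alpha with probability at least `success` when the true rating is p_true.
    Returns unit_size if no smaller sample qualifies."""
    k_crit = round(p_crit * unit_size)  # passes in a unit exactly at the criterion
    k_true = round(p_true * unit_size)  # passes in the postulated unit
    for r in range(1, unit_size + 1):
        crit = pass_count_pmf(unit_size, k_crit, r)
        # Lowest pass count whose upper tail, at the criterion, is at most alpha.
        tail, cutoff = 0.0, r + 1
        for x in range(r, -1, -1):
            if tail + crit[x] > alpha:
                break
            tail += crit[x]
            cutoff = x
        if cutoff > r:
            continue  # no outcome of this sample size is significant
        # Probability of reaching that pass count when the true rating holds.
        if sum(pass_count_pmf(unit_size, k_true, r)[cutoff:]) >= success:
            return r
    return unit_size


if __name__ == "__main__":
    print(required_sample_size(200, 0.845, 0.95))
```

Because the pass counts are whole numbers, the attained significance level and success probability are usually somewhat better than the nominal values, which is one reason a sketch of this kind need not match tabulated values exactly.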

In  practice,  the  calculation  described  above  would  be  very  tedious  for  large 
sample  sizes,  even  though  a computer  program,  described  in  appendix  A,  has  been 
written  to  calculate  the  terms  and  make  the  comparisons.  For  large  sample  sizes,  a 
drastic  simplification  has  been  found.  The  normal  approximation  to  the  hypergeometric 
distribution  for  large  unit  sizes  is  given  by  (reference  3,  page  247): 

$$\mu = rp,$$

$$\sigma^{2} = rp(1-p)\,\frac{N-r}{N-1},$$

where μ is the mean number of passes, r is the sample size, p is the pass probability,
and N is the unit size. The probability that the number of passes will fall in the range
from a to b is given by:

$$\Pr(a \le X \le b) = \Phi\!\left(\frac{b + 1/2 - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - 1/2 - \mu}{\sigma}\right),$$

where Φ is the cumulative normal distribution. For the upper 10 percent, for example,

$$1 - \Phi\!\left(\frac{a + 1/2 - \mu}{\sigma}\right) = 0.1,$$

$$\frac{a + 1/2 - \mu}{\sigma} \approx 1.28.$$

To deal with the distribution of results when the population mean is the criterion
value, μ = p_c r (where p_c is the criterion percentage) and a = p_0 r (where p_0 is the
percentage that defines the lower boundary of the ten-percent tail). Therefore, neglecting
the continuity term 1/2,

$$r(p_0 - p_c) = 1.28\,\sigma_c,$$

where σ_c is the standard deviation when p_c is the mean success rate. Similarly, the
lower ten-percent tail of the true rating sample distribution is given by:

$$r(p_0 - p_T) = -1.28\,\sigma_T.$$

Since σ is a function of r, p, and N, the last two equations involve the known terms
p_c, p_T, and N, and the unknown terms r and p_0. They are easily solved for the variable
of interest, the sample size r. (If either the success probability or the desired confidence
level is changed, the constant 1.28 is changed, and the formulas are slightly
altered. The program has this flexibility.) Results of this calculation are shown in
tables 6 through 9.
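
Eliminating the boundary percentage from the two tail equations gives a closed form for the sample size, sketched below in Python. The sketch assumes the standard hypergeometric variance with the finite-population factor (N - r)/(N - 1) and neglects the continuity term; both are assumptions of this illustration, and the function name is hypothetical. Subtracting the two equations and squaring gives r = N B / ((N - 1) D + B), where D is the squared difference between the true and criterion ratings and B is the squared sum of the two standardized deviations.

```python
import math
from statistics import NormalDist


def normal_approx_sample_size(unit_size, p_crit, p_true, alpha=0.1, success=0.9):
    """Sample size r (without replacement) from the two tail equations (sketch)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)  # about 1.28 for the upper ten-percent tail
    z_success = z.inv_cdf(success)    # about 1.28 for the lower ten-percent tail
    s_crit = math.sqrt(p_crit * (1.0 - p_crit))
    s_true = math.sqrt(p_true * (1.0 - p_true))
    spread = (z_alpha * s_crit + z_success * s_true) ** 2
    gap = (p_true - p_crit) ** 2
    # From r (p_true - p_crit) = z_alpha * sigma_c + z_success * sigma_T with
    # sigma^2 = r p (1 - p) (N - r) / (N - 1).
    return unit_size * spread / ((unit_size - 1) * gap + spread)


if __name__ == "__main__":
    for unit_size in (100, 1000, 10000):
        print(unit_size, round(normal_approx_sample_size(unit_size, 0.845, 0.90), 1))
```

The result never exceeds the unit size and levels off as the unit grows, which is the behavior the text describes for large units.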

Comparing  the  columns  of  table  5 with  the  appropriate  columns  of  tables  6 through 
9 shows  that  the  percentage  differences  are  quite  small  for  true  unit  ratings  of  90  and 
below.  Fortunately,  this  is  the  very  region  in  which  the  exact  calculations  are  tedious. 
Above  these  ratings,  the  sample  sizes  are  not  only  easier  to  calculate,  but  also  tend  to 
become  constant  for  unit  sizes  in  our  range.  For  unit  ratings  above  90,  we,  therefore, 
have  easily  replaced  the  approximate  results  of  tables  6 through  9 with  the  exact  results. 
Tables  10  through  13  give  the  results  of  this  hybrid  procedure. 
SAMPLE SIZE REQUIREMENTS: NORMAL APPROXIMATION, UNIT SIZE OF 100(100)1,000

SAMPLE SIZE REQUIREMENTS: NORMAL APPROXIMATION, UNIT SIZE OF 11,000(1,000)20,000

TABLE 11

SAMPLE SIZE REQUIREMENTS: UNIT SIZES OF 1,000(1,000)10,000

Criterion value = 84.5 percent
Success probability = 90 percent
Significance level = 0.1
APPENDIX A
COMPUTER  PROGRAMS 

This  appendix  presents  and  discusses  the  computer  programs  for  the  following 
calculations: 

• Sample  sizes  required  for  exceeding  a criterion  level  (with  replacement); 

• Sample  sizes  required  for  staying  below  a criterion  level  (with  replacement); 

• Sample  sizes  required  for  staying  inside  a pair  of  criterion  levels  (with 
replacement); 

• Sample  size  requirements  from  the  hypergeometric  distribution; 

• Sample size requirements from the normal approximation to the hypergeometric
distribution;

• Printing  the  final  size  requirement  matrixes. 

All  programs  are  written  in  APL/700  for  a Burroughs  B6000/B7000  computer. 

Two-Level  Calculation  (With  Replacement) 

Table A-1 shows the program ABOVE, which calculates the sample size required for
a given probability P that a sample rating will be above a given criterion level, at some
given significance level α, given the true unit rating. The success probabilities included
are .5, .8, .85, .9, .95, and .99. The significance levels are .1, .05, .01,
and .001. The criterion level and maximum unit rating are input as percentages. A
sample output is also shown in table A-1.

The rationale is from reference 1 (pages 607-610), where it is said to be inappropriate
for "very small" sample sizes. The sample size estimate is based on a t-test
for the equality of two percentages, where t is given by:

$$t = \frac{\arcsin\sqrt{p_1} - \arcsin\sqrt{p_2}}{\sqrt{820.8\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}$$

Here p_i is a fraction, N_i is a sample size, and the angles are in degrees.
N_2 is set "equal" to infinity to represent the criterion value. The t-value for the
assumed probability of success and the value for the assumed significance level are then
averaged to obtain an effective t (reference 4, page 349). The minimum sample size
requirement is then given by:
TABLE A-1

PROGRAM AND SAMPLE OUTPUT:
CALCULATION OF SAMPLE SIZE REQUIRED
(WITH REPLACEMENT) TO OBTAIN RESULTS
ABOVE A CRITERION LEVEL

      ∇ABOVE[⎕]∇
    ∇ ABOVE
      I←(PROB=0.5)+(2×PROB=0.8)+(3×PROB=0.85)+(4×PROB=0.9)+(5×PROB=0.95)+6×PROB=0.99
      J←(A=0.1)+(2×A=0.05)+(3×A=0.01)+4×A=0.001
      'PROBABILITY OF SUCCESS: ';4 2⍕PROB
      'SIGNIFICANCE LEVEL: ';5 3⍕A
      P←((P1-⌊P2),2)⍴0
      P[;1]←((⌊P2)+⍳P1-⌊P2)×0.01
      P[;2]←((P1-⌊P2)⍴P2)×0.01
      N←0.5×(CONST[I;J]÷(180÷○1)*2)÷((¯1○P[;1]*0.5)-¯1○P[;2]*0.5)*2
      NN←((⍴N),2)⍴0
      NN[;1]←P[;1]
      NN[;2]←⌊N+0.5
      'LOWER CUTOFF PERCENTAGE: ';4 1⍕P2
      ' ACTUAL    SAMPLE'
      ' UNIT      SIZE'
      ' RATING    REQUIRED'

PROBABILITY OF SUCCESS: 0.95
SIGNIFICANCE LEVEL: 0.050
LOWER CUTOFF PERCENTAGE: 84.5

 ACTUAL    SAMPLE
 UNIT      SIZE
 RATING    REQUIRED

  0.85      67186
  0.86       7259
  0.87       2537
  0.88       1254
  0.89        733
  0.9         473
  0.91        325
  0.92        234
  0.93        173
  0.94        132
  0.95        101
  0.96         79
  0.97         61
  0.98         47
  0.99         35
  1            20

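$$N = \frac{820.8\,[t_{\alpha} + t_{2(1-P)}]^{2}}{[\arcsin\sqrt{p_1} - \arcsin\sqrt{p_2}]^{2}}$$

The numerator is not a function of the ratings p_i; it has been tabulated for a range of α
and P. Table A-2 lists twice this numerator, 2 × 820.8 [t_α + t_{2(1-P)}]^2, as reference 1
does, so the programs multiply the tabulated value by 0.5. Table A-2 shows most of the
values derivable from standard tables of the t-distribution.

(Most of the data in table A-2 comes from page 609 of reference 1, where the last two
columns are mislabeled.)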

TABLE A-2

COEFFICIENTS FOR THE SAMPLE
SIZE CALCULATIONS

PROB.                   CONFIDENCE LEVEL
OF
SUCCESS         0.1       0.05       0.01      0.001

0.5          4442.2     6306.4    10891.5    17780.0
0.8         10150.2    12884.8    19171.6    28036.0
0.85        11841.0    14781.0    21473.0    30802.0
0.9         14059.3    17249.8    24426.2    34324.0
0.95        17768.0    21333.0    29247.0    39994.0
0.99        25890.0    30161.4    39450.1    51794.0

Table A-3 shows the program and output when the requirement is to show that a
rating is below a criterion level. In both cases (ABOVE and BELOW), the parameters
to be set are the success probability desired (PROB, from the list given above), the
confidence level (A, also from the above list), and the cutoff percentages (P2 if lower
and P1 if upper). The matrix CONST, given as table A-2, is also required.
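
As a cross-check on the matrix CONST, the coefficients of table A-2 can be regenerated approximately with the Python sketch below. It assumes each entry is 2 × 820.8 × (t_α + t_{2(1-P)})² with infinite degrees of freedom, and it uses exact normal quantiles from `statistics.NormalDist`, whereas the printed table appears to rest on rounded t-table values; agreement is therefore expected only to within a fraction of a percent. The function name is hypothetical.

```python
from statistics import NormalDist


def coefficient(success, alpha):
    """Approximate table A-2 entry for a success probability and a level alpha."""
    z = NormalDist()
    t_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-tailed value at level alpha
    t_power = z.inv_cdf(success)            # two-tailed value at level 2(1 - P)
    return 2.0 * 820.8 * (t_alpha + t_power) ** 2


if __name__ == "__main__":
    for success in (0.5, 0.8, 0.85, 0.9, 0.95, 0.99):
        row = [round(coefficient(success, a), 1) for a in (0.1, 0.05, 0.01, 0.001)]
        print(success, row)
```

Half of a coefficient, divided by the squared difference of the two arcsin angles in degrees, gives the sample size computed by ABOVE and BELOW.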

Three-Level Calculations (With Replacement)

When there are three or more levels, units can be confidently placed in intermediate
levels by establishing that the true unit rating is likely to be simultaneously above a lower
cutoff and below an upper cutoff. Table A-4 gives a program for this calculation, and table
A-5 shows part of a typical output. The rationale is described in the main body of this
report. Some program details bear discussion here.


TABLE A-3

PROGRAM AND SAMPLE OUTPUT: CALCULATION
OF SAMPLE SIZE REQUIRED (WITH REPLACEMENT)
TO OBTAIN RESULTS BELOW A CRITERION LEVEL

      ∇BELOW[⎕]∇
    ∇ BELOW
      I←(PROB=0.5)+(2×PROB=0.8)+(3×PROB=0.85)+(4×PROB=0.9)+(5×PROB=0.95)+6×PROB=0.99
      J←(A=0.1)+(2×A=0.05)+(3×A=0.01)+4×A=0.001
      'PROBABILITY OF SUCCESS: ';4 2⍕PROB
      'SIGNIFICANCE LEVEL: ';5 3⍕A
      P←(((⌈P1)-P2+1),2)⍴0
      P[;1]←(P2+⍳(⌈P1)-P2+1)×0.01
      P[;2]←(((⌈P1)-P2+1)⍴P1)×0.01
      N←0.5×(CONST[I;J]÷(180÷○1)*2)÷((¯1○P[;1]*0.5)-¯1○P[;2]*0.5)*2
      NN←((⍴N),2)⍴0
      NN[;1]←P[;1]
      NN[;2]←⌊N+0.5
      'UPPER CUTOFF PERCENTAGE: ';4 1⍕P1
      ' '
      ' ACTUAL    SAMPLE'
      ' UNIT      SIZE'
      ' RATING    REQUIRED'

PROBABILITY OF SUCCESS: 0.95
SIGNIFICANCE LEVEL: 0.050
UPPER CUTOFF PERCENTAGE: 97.0

 ACTUAL    SAMPLE
 UNIT      SIZE
 RATING    REQUIRED

  0.86         74
  0.87         86
  0.88        101
  0.89        121
  0.9         149
  0.91        190
  0.92        256
  0.93        370
  0.94        603
  0.95       1228
  0.96       4368

TABLE A-4

PROGRAM FOR CALCULATION OF
SAMPLE SIZE REQUIRED
(WITH REPLACEMENT) TO OBTAIN RESULTS
WITHIN A RATING LEVEL

      ∇INSIDE[⎕]∇
    ∇ INSIDE
[1]   'UPPER LIMIT: ';P1
[2]   'LOWER LIMIT: ';P2
[3]   'SIGNIFICANCE LEVEL: 0.1'
[4]   P←P2+1
[5]   AGAIN:II←1
[6]   M←(⍴CC[;1])⍴0
[7]   LOOP:M[II]←CC[II;2]÷((¯1○(P×0.01)*0.5)-¯1○(P2×0.01)*0.5)*2
[8]   II←II+1
[9]   →LOOP×⍳II≤⍴M
[10]  DELTA←|((¯1○(P×0.01)*0.5)-¯1○(P1×0.01)*0.5)×180÷○1
[11]  T←(DELTA×(M÷820.8)*0.5)-1.645
[12]  T←(T×T≤2.576)+2.576×T>2.576
[13]  J←1
[14]  PP←(⍴M)⍴0
[15]  →SWITCH×⍳+/T<0.542
[16]  JJ:PP[J]←+/0.4954 0.41581 ¯0.01178 ¯0.084859 0.0297 ¯0.0030905×T[J]*0 1 2 3 4 5
[17]  J←J+1
[18]  →JJ×⍳J<1+⍴M
[19]  →SET
[20]  SWITCH:II←1
[21]  LOOP2:M[II]←CC[II;2]÷((¯1○(P×0.01)*0.5)-¯1○(P1×0.01)*0.5)*2
[22]  II←II+1
[23]  →LOOP2×⍳II≤⍴M
[24]  DELTA←|((¯1○(P×0.01)*0.5)-¯1○(P2×0.01)*0.5)×180÷○1
[25]  T←(DELTA×(M÷820.8)*0.5)-1.645
[26]  T←(T×T≤2.576)+2.576×T>2.576
[27]  J←1
[28]  JJ1:PP[J]←+/0.4954 0.41581 ¯0.01178 ¯0.084859 0.0297 ¯0.0030905×T[J]*0 1 2 3 4 5
[29]  J←J+1
[30]  →JJ1×⍳J<1+⍴M
[31]  MM←((⍴M),4)⍴0
[32]  MM[;2]←CC[;1]
[33]  MM[;1]←PP×T≥0.542
[34]  MM[;3]←(MM[;1]+MM[;2]-1)×T≥0.542
[35]  MM[;4]←⌊M+0.5
[36]  →PRINT
[37]  SET:MM←((⍴M),4)⍴0
[38]  MM[;1]←CC[;1]
[39]  MM[;2]←PP×T≥0.542
[40]  MM[;3]←(MM[;1]+MM[;2]-1)×T≥0.542
[41]  MM[;4]←⌊M+0.5
[42]  PRINT:'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
[43]  ' '
[44]  'TRUE RATING: ';P
[45]  ' '
[46]  '    PROBABILITIES          NUMBER OF'
[47]  '          OF  '
[48]  '  LOWER   UPPER   INSIDE   CASES'
[50]  K←1
[51]  DO:9 4⍕MM[K;1];9 4⍕MM[K;2];9 4⍕MM[K;3];9 0⍕MM[K;4]
[52]  K←K+1
[53]  →DO×⍳K≤⍴M
[54]  ' '
[55]  PROB←80
[56]  OUT:S←(⍴M)+1-+/MM[;3]<PROB×0.01
[57]  L←S-1
[58]  NUMBER←10*(10⍟MM[S;4])+(10⍟MM[L;4]÷MM[S;4])×(10⍟(PROB×0.01)÷MM[S;3])÷10⍟MM[L;3]÷MM[S;3]
[59]  'NUMBER OF CASES FOR ';PROB;' PERCENT PROBABILITY IN REGION: ';5 0⍕⌊NUMBER+0.5
[60]  PROB←PROB+5
[61]  →OUT×⍳PROB<100
[62]  →NEW×⍳PROB=104
[63]  PROB←99
[64]  →OUT
[65]  NEW:P←P+1
[66]  →AGAIN×⍳P<P1
[67]  →0
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It  is  assumed  that  no  probabilities  exceed  99.5  percent.  (In  principle,  this  is 
easily  extended  to  99.9  percent.)  When  any  probability  is  less  than  70  percent 
(t  <0.  542),  the  program  switches  from  calculating  the  sample  size  initially  for  the 
lower  criterion  to  calculating  it  initially  for  the  upper  criterion  (step  15).  This  keeps 
the  composite  probabilities  within  the  range  of  interest.  The  example  output  in  table  A -5 
shows  this  shift.  When  the  composite  probabilities  have  been  calculated  for  each  of 
seven  one-level  probabilities  (ranging  from  .7  to  .995),  logarithmic  interpolation  is 
used  to  obtain  the  sample  size  requirements  for  probabilities  of  80,  85,  90,  and  95  percent. 
The  program  is  written  for  a 0.1  significance  level,  which  is  indicated  in  step  3 and 
determined  by  the  constant  ta  = 1.645  in  steps  11  and  25.  Generalization  to  a set  of 
significance  levels  is  straightforward. 

The input to program INSIDE is nearly the same as for ABOVE and BELOW, except
that the constant is now given by the matrix CC, which includes the factor 2 and the
conversion from radians to degrees. This matrix is given in table A-6.


Sample Size Requirements from the Hypergeometric Distribution

Three nested programs were used to calculate sample sizes from hypergeometric
distributions. Table A-7 shows the main routine CALC. Table A-8 shows the subroutines.
HYPRR calculates the cumulative hypergeometric distributions; STIR is a subroutine of
HYPRR, used to calculate the logarithms of binomial coefficients.


The  calculation  presents  two  problems:  numbers  that  are  too  large  or  too  small  to 
handle,  and  matrixes  that  are  too  large  for  an  APL  workspace.  Even  when  the  distribution 
terms  are  not  small  enough  to  neglect,  the  binomial  coefficients  required  may  be  out 
of  the  computer  range.  STIR  avoids  this  problem  by  calculating  the  logarithms  of  the 
coefficients.  STIR  is  slightly  complicated  by  the  necessity  to  handle  binomial  coefficients 
that  use  zero  or  negative  factorials.  The  routine  uses  a modified  Stirling’s  formula 
(reference  2,  page  52). 
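
The idea of working with logarithms of binomial coefficients can be sketched in Python as follows. This is an illustration only: it uses the common form of Stirling's formula with a 1/(12n) correction term, which is what the STIR listing appears to contain, and it is not claimed to be the exact modified formula of reference 2. The function names and the treatment of out-of-range coefficients (returned as minus infinity, that is, a zero coefficient) are assumptions.

```python
from math import exp, lgamma, log, pi

HALF_LOG_2PI = 0.5 * log(2 * pi)

def log_factorial(n):
    """Stirling's formula with the 1/(12n) correction; ln 0! is taken as 0."""
    if n == 0:
        return 0.0
    return HALF_LOG_2PI + (n + 0.5) * log(n) - n + 1 / (12 * n)

def log_binomial(z, x):
    """Approximate ln C(z, x); impossible coefficients map to -inf (zero)."""
    if x < 0 or x > z:
        return float("-inf")
    if x == z:
        return 0.0
    return log_factorial(z) - log_factorial(x) - log_factorial(z - x)

# The approximation is close even for modest arguments ...
print(exp(log_binomial(20, 7)))            # exact value is 77520
# ... and stays finite where the coefficient itself would overflow.
print(log_binomial(20000, 10000))
print(log_factorial(170) - lgamma(171))    # error against the exact ln 170!
```

Working in logarithms means that only sums and differences of moderate size are formed, so the coefficients never have to be represented directly.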


HYPRR calculates distribution terms in two ways. First, it simply calculates the
terms, using STIR, until a term exceeding some tolerance level, TOL, is found. From
that point on, the terms are calculated by means of the following recursion relation
(reference 3, page 194):

$$H(I,J) = \frac{(K+2-I)\,(J+2-I)}{(I-1)\,(N+I-1-K-J)}\; H(I-1,J) ,$$
TABLE A-5


PARTIAL PROGRAM OUTPUT FOR CALCULATION
OF SAMPLE SIZE REQUIRED (WITH REPLACEMENT) TO
OBTAIN RESULTS WITHIN A RATING LEVEL


TRUE RATING: 95.5

       PROBABILITIES           NUMBER
                                 OF
LOWER     UPPER     INSIDE     CASES

0.9950    0.9950    0.9900      1?2
0.9900    0.9950    0.9850      108
0.9750    0.9915    0.9665       89
0.9500    0.9793    0.9293       74
0.9000    0.9487    0.8487       59
0.8000    0.8725    0.6725       42
0.7000    0.7894    0.4894       33

NUMBER OF CASES FOR 80 PERCENT PROBABILITY IN REGION: 54
NUMBER OF CASES FOR 85 PERCENT PROBABILITY IN REGION: 59
NUMBER OF CASES FOR 90 PERCENT PROBABILITY IN REGION: 68
NUMBER OF CASES FOR 95 PERCENT PROBABILITY IN REGION: 82


TRUE RATING: 96.5

       PROBABILITIES           NUMBER
                                 OF
LOWER     UPPER     INSIDE     CASES

0.9950    0.9950    0.9900      126
0.9950    0.9900    0.9850      111
0.9938    0.9750    0.9688       92
0.9838    0.9500    0.9338       76
0.9575    0.9000    0.8575       60
0.8876    0.8000    0.6876       44
0.8080    0.7000    0.5080       34

NUMBER OF CASES FOR 80 PERCENT PROBABILITY IN REGION: 54
NUMBER OF CASES FOR 85 PERCENT PROBABILITY IN REGION: 59
NUMBER OF CASES FOR 90 PERCENT PROBABILITY IN REGION: 69
NUMBER OF CASES FOR 95 PERCENT PROBABILITY IN REGION: 83

TABLE A-6


CONSTANTS MATRIX (CC) FOR PROGRAM "INSIDE"

Probability P     Constant(a)

    .995            4.454
    .99             3.942
    .975            3.249
    .95             2.706
    .9              2.142
    .8              1.546
    .7              1.196

aConstant  = 2 x (180/tt)^  x 820.8  x 

[tQ  + fc2  ( 1-P ) J , 

where α = 0.1.
TABLE A-7

ROUTINE FOR CALCULATING
SAMPLE SIZES FROM
HYPERGEOMETRIC DISTRIBUTION


vCALCLOlv 
v CALC 

111  K2<-K20 

i:2.1  NEW  J 5 J2<- J.l  FEU 
L3  3 E J 1 xKl,:N 

i::41  SIGMAMEx  (N-Jl ) x ( N--K1 ) * N*2  ) *0 . 5 
LSI  ROtm-ir  + LE+SIGMA 
L63  El«-J2xK2*N 

i: 73  SI GMA 1 < E 1 x ( N~ J2 ) x ( N-K2  ) *N*2 ) *0 . 5 
L83  R0W2+T  ( FE1-SIGMA1 ) LE+X2XSIGMA 
1:93  row  1 r 1 r row  1. 1.  e:  1 "X 1 x s i gma  1 
LI 03  K+Kl 
LI  13  HMK-HYPRR 
L123  ->NE:WK  x i SW“  0 

L 133  P1CK«-  < ( L./HM1  ) SO  . 2 ) A < L /HM 1 ) £ 0 ♦ 00 1 
LI  43  PICK*-  PICK/ l 1+R0W2-R0U1 

LI  5 3 < (7+2XDJS4)  ,0)  r(XX*Jl-l  > xXX<-“2+Jl  f l 2+J2-J1 

L 1 63  ' ' 

L'l  73  ( (7+2XDJS4)  >2)  ? < * < “2+R0W1  ) + \ 1 + R0W2-R0W1  )CPICKf  3 » C23 

HMILPICKi 3 

L 183 
I!  19  3 

L203  NEWK5K<-K2 
L 2 1 3 HM2HHYPRR 
L223  ->SKIPx\SU~0 

L 23  3 PICK*-  ( (L/HM2)SO»2)A(r  /HM2 ) i 0 . 00 1 
I 24  3 PICK<-PICK/\  1+R0W2-R0W1 

C253  ( ( 7+  2XDJS4  ) ,0)r(  XX*  J 1 -1 ) xXX<-~  2+ J1  + ( 2+ J2- J 1 

C 26  3 

C 273  ( ( 7+  2 x II  Js  4 ) » 2 > t < * < "2+ROW 1 ) + l 1 +R0W2- ROW  1 ) L P I CK  t 3 t L 23 

HM2LPICK i 3 


TABLE A-7 (CONT'D)


£28 3 SKIP:'  ' 

£ 29  3 HMlA«-HMl-j;0. 1 

£303  HM2A<-HM2:-;0. 1 

£313  H 1 <- 1 + ( p HM 1 A£  » 13  )-+/HMlA 

£323  H2<-1++/HM2A 

£33  3 H*H1-H2 

£343  HH<-  ( ( 1+J2-J1  ) »2)p0 

£.35  3 HH£  i 1 3<-  < Jl-1 ) + \1  + J2-  J1 

£36  3 HH£  } 23<-H 

£373  'UNIT  SIZE:  ' 5 N 

£383  ' PASSES(CRITERION)  : ' *Klf'<'»5  lflOOxKl-^Ni'  PERCENT)' 

£393  'PASSES (TRUE  RATING):  ' iK2» ' < ' »5  l?100xK2+N» ' PERCENT) 

I 

£40  3 ' ' 

£413  5 Of HH 


£433  K2«-K2-EiP 
£443  -»NEUKx \K2>K1 
£453  Jlf- Jl  + DJ+1 
£463  K2<-K20  N 

£47  3 ' I I I I I I I I I I I I t I I I I I I I I I I I I II  I I I I I II  I I I I I I i I I I I I I I I I I I I 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ' 

£483  -*NEWJx  \ J2SLIMIT-DJ+1 
TABLE  A-8 

SUBROUTINES  FOR  CALCULATING  SAMPLE 
SIZE  REQUIREMENTS  FROM  THE  HYPERGEOMETRIC  DISTRIBUTION 


vHYPRRCDD v 
v HHH<-  HYF'RR 

Cl  3 ( 1+R0W2-  ROW1  ) > 1+J2-J1  ) pO 

C23  J<-1 

c 3 :i  i <- 1 

C 4 3 COLl  MCOL.11X  \ ( J1+J+1-ROW1  + I ) ^N~K 
C53  HHHC  I 5 J3<-0 
C63  -»II 

[.73  COL11 5 HHHC  I i J3<-*<  ( ROW1  + 1-2  ) STIR  K>  + < ( J1  + J+1-ROU1  + I ) 
STIR  N-K ) - ( J1 + J- 1 ) STIR  N 
C83  -»RECURSx  \ HHHC  I f J3STOL. 

C93  ii:i<-i  + i 

C 10  3 -»COL.  1 x \ ( I j;J1  + J+1-R0W1  )aI  =;1+R0U2-R0W1 
Cl  13  jj:j«-j+i 
Cl  23  I 1 

C 1 3 3 •♦COL.lxx  Jil+J2-Jl 

C 143  -»SUM 

C 153  recurs:  I I f 1 

C16  J ->JJx  U>1+R0U2-R0U1 

[173  HHHC  I i J3<-HHHCI-1  i J3  x ( K+3- I+R0W1 ) x ( J1+J+2-I  + RUW1 ) f ( l + 
RQW1-2)  XN+  U RDU  1-K+J  + J 1 + 1 
C 183  -»ONxiHHHLIJ  J3  iHHHC  1-1 J J3 
C 193  -»JJx  iHHHCI  i J3<1E~6 
C 20  3 0N:->RECURSX  i I<1+R0U2~R0W1 
C 2 1 3 ->JJ 

C223  SUM? -»SUMlx  \K=K2 
C233  HHH  <"■=>+ \*HHH 
C243  ->0 

C253  SUM1  :HHH<-  + \HHH 
C 263  -»0 

<7 


TABLE A-8 (CONT'D)


      ∇STIR[⎕]∇
    ∇ LN←X STIR Z
[1]   →EQUAL×⍳Z=X
[2]   →ZERO1×⍳Z=0
[3]   LN1←FACT+((Z+0.5)×⍟Z)+(÷12×Z)-Z
[4]   →ZERO1+1
[5]  ZERO1:LN1←⍟1
[6]   →ZERO2×⍳X=0
[7]   LN2←FACT+((X+0.5)×⍟X)+(÷12×X)-X
[8]   →ZERO2+1
[9]  ZERO2:LN2←⍟1
[10]  LN3←FACT+((0.5+Z-X)×⍟|Z-X)+(÷12×Z-X)+X-Z
[11]  LN←LN1-LN2+LN3
[12]  LN←LN-30×Z<X
[13]  →0×⍳Z≠X
[14] EQUAL:LN←0
    ∇
where I is one more than the number of successes in the sample, K is the number of
successes in the population, J is the sample size, and N is the population size. When
a term less than some tolerance level (and also less than the previous term) is reached,
the remaining terms are assumed to be zero.
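
The two-stage calculation (logarithms first, recursion afterwards) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the HYPRR routine: the function names are hypothetical, `lgamma` stands in for STIR, the default `tol` is an arbitrary illustrative value, and the `cutoff` of 1e-6 follows the figure quoted in the user's guide. The index `i` plays the role of I in the recursion relation, that is, one more than the number of successes.

```python
from math import comb, exp, lgamma

def log_binom(n, k):
    """ln C(n, k) from log-gamma (a stand-in for the STIR subroutine)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_terms(N, K, J, tol=1e-10, cutoff=1e-6):
    """Return terms[x] = P(x successes) for x = 0..J in a sample of size J."""
    lo, hi = max(0, J - (N - K)), min(J, K)
    terms = [0.0] * (J + 1)
    recursing = False
    for x in range(lo, hi + 1):
        if not recursing:
            # Stage 1: direct evaluation through logarithms until a term
            # of at least `tol` has been found.
            t = exp(log_binom(K, x) + log_binom(N - K, J - x) - log_binom(N, J))
            recursing = t >= tol
        else:
            # Stage 2: recursion on the previous term, with i = x + 1.
            i = x + 1
            t = terms[x - 1] * (K + 2 - i) * (J + 2 - i) / (
                (i - 1) * (N + i - 1 - K - J))
            if t < cutoff and t < terms[x - 1]:
                break  # remaining terms are treated as zero
        terms[x] = t
    return terms

terms = hypergeom_terms(20, 7, 5)
exact = [comb(7, x) * comb(13, 5 - x) / comb(20, 5) for x in range(6)]
print(max(abs(a - b) for a, b in zip(terms, exact)))
```

Starting the recursion only after a term of reasonable size has been found avoids multiplying up from a first term that has underflowed to zero, which is the situation that arises for large units.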

HYPRR  and  STIR  thus  allow,  in  a reasonably  efficient  manner,  the  calculation  of  any 
distribution  terms  of  practical  interest;  but,  for  large  units,  the  number  of  terms  exceeds 
the  workspace  capacity.  While  it  is  desirable  to  have  a routine  that  can  calculate  and  sum 
all  terms  (to  verify  directly  that  the  calculated  probabilities  do  sum  to  one),  only  a small 
fraction  of  the  terms  are  required  for  the  principal  purpose. 

Consider the case of sample sizes for testing whether a criterion has been exceeded.
Only one point on each of two cumulative distributions need be determined. Assuming
that the true unit rating is the criterion level, we calculate the point at which the
ten-percent upper tail starts. Sample ratings above that point have, at most, a ten-percent
chance of being due to true unit ratings at or below the criterion level. We then locate
the upper end of the ten-percent lower tail of the distribution with some other, higher,
true unit rating assumed. Sample ratings above that point will occur 90 percent of the
time if the true unit rating is that assumed. Initially, for small sample sizes, the two
tails will not overlap, and the desired success probability and statistical confidence level
are not achievable simultaneously. When the two end points merge (or cross), the required
sample size has been found. The program is set to indicate that point by displaying
the number of successes at the upper ten-percent point of the criterion distribution, less
the number at the lower ten-percent point of the true-rating distribution. This number,
initially positive, goes to zero at the smallest acceptable sample size and, finally, becomes
more and more negative. The initial zero may be followed by positive terms (always ones),
representing tradeoffs where a larger sample size produces, for example, a success
probability greater than 90 percent, but a confidence level slightly greater than the
desired ten percent. We define the required sample size as that producing the initial zero.
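
The search for the crossing point can be sketched in Python for small units, where the exact distribution is cheap to evaluate. This is an illustrative reading of the procedure, not the CALC routine: the function names are hypothetical, and the acceptance rule (take the smallest cutoff whose upper tail under the criterion does not exceed `alpha`, then require that the same cutoff be reached with probability at least `prob` under the true rating) is an assumption about how the two ten-percent points are compared.

```python
from math import comb

def hypergeom_pmf(x, n, K, N):
    """P(x successes) in a sample of n drawn without replacement."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def upper_tail(c, n, K, N):
    """P(X >= c) for the hypergeometric distribution."""
    return sum(hypergeom_pmf(x, n, K, N) for x in range(c, min(n, K) + 1))

def required_sample_size(N, K1, K2, alpha=0.1, prob=0.9):
    """Smallest n (with its cutoff c) at which the two tail points meet.

    N  -- unit size
    K1 -- number of successes in the unit at the criterion level
    K2 -- number of successes in the unit at the assumed true rating
    """
    for n in range(1, N + 1):
        # Smallest cutoff whose upper tail under the criterion is <= alpha.
        for c in range(0, n + 2):
            if upper_tail(c, n, K1, N) <= alpha:
                break
        # The same cutoff must be reached often enough under the true rating.
        if upper_tail(c, n, K2, N) >= prob:
            return n, c
    return None

print(required_sample_size(10, 5, 10))  # (3, 3)
```

In the small example, a unit of ten with a criterion of five successes and a true rating of ten successes needs a sample of three: three successes in three draws has probability 10/120 under the criterion and probability one under the true rating.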

It will be observed that all that is required is to identify the initial zero, which occurs
at ten-percent points on each distribution. Because cumulative distributions are required
for comparisons, it is necessary to calculate the ten-percent tail terms of each of two
hypergeometric distributions. The tails can be very long, however, consuming computer
time and space. We calculate the standard deviations of the criterion distribution (SIGMA)
and the true-rating distribution (SIGMA1), and calculate only terms within X2 SIGMAs of
the true-rating distribution and X1 SIGMA1s of the criterion distribution. By occasionally
monitoring the detailed output (the cumulative distribution terms) and the input parameters,
the terms calculated can be held to only a few more than absolutely required. The program
PAR, table A-9, prints the input parameters to assist in this adjustment. (Initially, a
more elegant approach, not requiring user intervention, was developed, but it proved to
require a large fraction of the program running time.)
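
The effect of these cutoffs can be sketched in Python. The sketch is hedged in two ways. First, the standard deviation follows what the CALC listing appears to compute (the mean times (N-J)(N-K)/N², where the textbook hypergeometric variance divides by N(N-1) instead). Second, the window rule, the function names, and the parameters `x_low` and `x_high` are assumptions for illustration and do not reproduce the exact ROW1 and ROW2 expressions in the listing.

```python
from math import ceil, floor, sqrt

def sigma(N, J, K):
    """Standard deviation of the success count, as CALC appears to form it."""
    mean = J * K / N
    return sqrt(mean * (N - J) * (N - K) / N ** 2)

def success_window(N, J, K_crit, K_true, x_low, x_high):
    """Success counts kept for calculation (assumed rule).

    Runs from x_low standard deviations below the true-rating mean
    to x_high standard deviations above the criterion mean.
    """
    lo = floor(J * K_true / N - x_low * sigma(N, J, K_true))
    hi = ceil(J * K_crit / N + x_high * sigma(N, J, K_crit))
    return max(lo, 0), min(hi, J)

# Hypothetical inputs: unit of 100, sample of 60, criterion 85, true rating 92.
print(round(sigma(100, 60, 85), 2))
print(success_window(100, 60, 85, 92, 10, 10))
```

With generous cutoffs the window is wide; reducing the cutoffs until the detailed output just retains a few near-zero terms at each end is the manual adjustment the text describes.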
TABLE A-9


PROGRAM FOR PRINTING INPUT PARAMETERS
FOR CALC, PLUS SAMPLE OUTPUT


vpARLDJv 
v PAR 


C13 

•PARAMETERS' 

C23 

1 1 

L33 

•N=  ' i6  OrNf ' ‘ i 

•Jl  = •} J1 

L 4 3 

• ki=  * ;s  otki f 1 

f ' K20-  ' i K20 

LSI 

‘ D J=  • i 5 OrEiJf  ' 

f— 1 

u 

•mb 

“n 

L63 

•sw=  • ;s  o?su 

C73 

* L. I M I T = ' » LIMIT 

L83 

1 X2=  ' 52  0»X2» ' 

• i ' Xl  = ' i 2 

OfXl 

L93 

' SIGMA  = 1 55  1 ? SIGMA 

i‘  ' » ' SIGMA1  = 

' 55  lfSIGMAl 

L 1 0 3 

' ROW1 = ' >5  OtROUI t • 

1 i'R0W2=  ' 55 

Of R0W2 

v 


      PAR

PARAMETERS

N=   100     J1= 60
K1=  7 A     K20= 82
DJ=    9     DP= 1
SW=    0
LIMIT= 90
X2= 10     X1= 10
SIGMA=   1.7     SIGMA1=   1.2
ROW1=   43     ROW2=   60

The programs described above work for unit sizes up to at least 20,000, using
piecemeal calculations. This means testing five or ten different sample sizes at a
given time. For a unit size as small as ten, the program could simply be run to
calculate and print out all terms. With larger unit sizes, this process leads to
overlapping lines in the printout, and, fairly quickly, it will also overflow the available
computer workspace. Consider the total output for a given unit size. It involves a
matrix for the criterion value that has a column for each sample size, and a similar
matrix for each true unit rating considered. To determine the sample sizes for ten
different true ratings, we need to identify the ten pairs of matrix elements representing
the 10-percent-point crossovers. With a unit size of 1,000, there are 1,001 elements
in a column, 1,000 different sample sizes (columns), and, say, ten true-rating matrixes, for
approximately eleven million elements, of which only 20 are of direct interest. Many
of these elements have already been eliminated by the cutoffs already described, but
these grow less useful as more sample sizes are considered at one time. Still, there
are many terms to examine to determine a large sample size for a large unit, and,
even with efficient selection of regions for examination, the process becomes tedious.
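
The element count quoted above can be checked with a few lines of Python. The variable names are illustrative; the count assumes one criterion matrix plus ten true-rating matrixes, each with a row for every possible number of successes and a column for every sample size.

```python
unit_size = 1000

elements_per_column = unit_size + 1   # 0 through 1,000 successes
sample_sizes = unit_size              # one column per sample size
matrices = 1 + 10                     # criterion matrix plus ten true-rating matrixes

total = elements_per_column * sample_sizes * matrices
print(total)  # 11011000, i.e. approximately eleven million
```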

The final step that makes the calculations reasonable is to incorporate a normal
approximation into the system (reference 3, page 247). A program for this calculation
is given as table A-10, and a partial output is shown as table A-11. When these results
were compared to the exact results, it was found that, in the regions where the difference
was greater than about two percent, the exact calculations were easy to do. In the
other regions, the normal approximation tended to underestimate slightly the required
sample sizes.
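
A normal approximation of this kind can be sketched in Python. The NORM listing is not reproduced in this part of the appendix, so the sketch below is an assumed textbook form rather than the program itself: the sample proportion is treated as normal with the finite-population variance p(1-p)(N-n)/(n(N-1)), and the sample size is found at which the upper cutoff of the criterion distribution meets the lower cutoff of the true-rating distribution. The function name is hypothetical; the default cutoffs of 1.282 are the ten-percent-tail values quoted in the user's guide.

```python
from math import sqrt

def approx_sample_size(N, p_crit, p_true, beta1=1.282, beta2=1.282):
    """Normal-approximation sample size, sampling without replacement.

    Solves  p_crit + beta1*sd_crit(n) = p_true - beta2*sd_true(n)
    with    sd(n)**2 = p*(1-p)/n * (N-n)/(N-1).
    """
    a = beta1 * sqrt(p_crit * (1 - p_crit)) + beta2 * sqrt(p_true * (1 - p_true))
    d = p_true - p_crit
    return N / (1 + (N - 1) * d * d / (a * a))

n = approx_sample_size(90, 0.845, 0.900)
print(round(n, 1))
```

For a unit of 90 with a criterion of 84.5 percent and a true rating of 90 percent, this form gives a value between 65 and 66, somewhat below the exact requirement of 68 reported in table B-1, which is the direction of error noted in the text.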

The approximation produces a full set of sample sizes over a range of unit sizes from
10 to 20,000 (in reasonable increments) in a trivial amount of computer time. These
answers can then be used in two ways. First, the easy calculations (small units or true
ratings well away from the criterion) are done until the exact sample sizes satisfactorily
match the approximate sample sizes, at which point the remainder of the required sample
sizes for that unit size are taken from the approximation. Secondly, in this process, the
search for the exact size can be started at sizes only a little below the approximate size,
adding greatly to the efficiency of the search. Finally, program P (table A-12) is used to
print the corrected sample size results in the format of table A-11.
A USER'S GUIDE

This  appendix  presents  some  guidance  for  users  of  the  programs  described  in 
appendix  A.  Appendix  A contains  the  programs  and  general  descriptions  and  should  be 
read  first. 

PROGRAMS  ABOVE  AND  BELOW 

Select the success probability desired from the following list: .5, .8, .85, .9, .95,
.99. (Example: PROB ← .5)

Select the confidence level desired from the following list: .1, .05, .01, .001.
(Example: A ← .1)

Specify the criterion value as a percentage. (Example: P2 ← 84.5. For
BELOW, use P1.)

Specify the highest (or lowest) true rating of interest as a percentage. (Example:
P ← 100)

PROGRAM  INSIDE 

Specify the upper criterion level (P1) and the lower criterion level (P2) as percentages.

The program is set for a significance level of 0.1, printed out by step 3 and set by
the constant 1.645 in steps 11 and 25. This is easily generalized in the manner of
ABOVE and BELOW to take significance level as an input.

The results are printed for a selection of result probabilities. These are based on
approximations valid over the probability range from .7 to .995. This range is incorporated
in the assumption that all probabilities are taken to be < .995 (t < 2.576), and in
the coefficients relating probabilities to t-values (lines 16 and 28). If the probability
limits are expanded, the coefficients must be recalculated, and, perhaps, the degree
of the polynomial would need to be increased.

Step 55 specifies the lowest probability for which results are desired. This is easily
changed to an input parameter.
Input parameters to be set are:

N -- unit size.

K1 -- criterion number of successes in a unit of size N.

K20 -- the true-rating number of successes with which the program starts.

DP -- step decrease in number of successes, starting from K20.

J1 -- sample size with which calculations start.

DJ -- 1+DJ is the number of consecutive sample sizes considered in a
calculation run.

SW -- SW≠0 selects a detailed printout; SW=0 selects a minimal printout.

LIMIT -- stops the calculation when the sample size would exceed some number,
usually the unit size.

SIGMA -- standard deviation of the distribution based on the criterion value.

X1 -- number of standard deviations over the mean beyond which the criterion
value distribution terms are negligible (determined by examination of
detailed output).

SIGMA1 -- analogous to SIGMA, but for distribution based on true unit rating.

X2 -- analogous to X1, but representing a lower cutoff on calculation of
true-rating distribution terms.

ROW1 -- the first row for which at least one term needs to be calculated. (Row
number less one equals the number of successes. Row terms give
cumulative probabilities of obtaining this number for various sample sizes.
For the distribution based on criterion values, the upper tail is required,
and terms are summed in reverse order.)

ROW2 -- the last row for which at least one term requires calculation.


The program can be used with and without the guidance provided by the normal
approximation (output of program NORM). An inspection of tables 10 through 13 in the main text
shows that selecting the order in which various unit sizes are examined can avoid
unnecessary calculation of sample sizes that do not vary with unit size. Note also that only in
small regions of the table are adjacent sample sizes in a column within ten samples of
each other. Generally DJ should be set to 9, allowing examination of ten sample sizes
in one run; a run will then yield only one answer. The true rating, if any, for which
one of the ten sample sizes is appropriate is found by starting with the highest success
number possible (the highest K20, considering sample size requirements already
determined), and stopping when an answer is found. Table B-1 shows the determination of a
sample size requirement of 68 to establish that a unit of size 90, with a true rating of
90.0 percent, is above a criterion level of 84.5 percent. This determination can be made
90 percent of the time with at least 90-percent statistical confidence.


TABLE B-1

PORTION OF A MINIMAL CALC OUTPUT
SHOWING A SAMPLE SIZE REQUIREMENT OF 68


UNIT SIZE: 90

PASSES(CRITERION): 76 (84.4 PERCENT)
PASSES(TRUE RATING): 81 (90.0 PERCENT)


Initially, X1 and X2 are set to some large value, say 20. In the detailed output,
we should find that every column has a few zero terms, showing that neglected rows
are indeed negligible. The printout, however, is set to suppress rows outside the
area of interest; therefore, many rows may be calculated but not displayed. The output
of PAR shows which rows are calculated (those from ROW1 to ROW2), and X1 and X2
are adjusted to keep the calculated rows to only a few more than the required number.
This adjustment need not be made very frequently.
The program is set for a significance level of 0.1 and a success probability of
90 percent, both represented by 10-percent tails. In principle, perhaps, we could
simply print out all of the matrixes and have the potential of selecting sample sizes for
any combination of confidence level and success probability. In fact, however, this
method is made practical only by stratagems that minimize nonrelevant calculations.

If another combination is required, it should be separately calculated. The confidence
level is set by the 0.1 in step 29 of CALC. The 0.1 in the next step sets the 90-percent
success probability. Steps 13 and 23 limit the detailed printout to terms just short of
the 10-percent tail set by the confidence level; an adjustment would be required for a
larger confidence level (such as 0.2), and would be desirable for a smaller level.

PROGRAMS  HYPRR  AND  STIR 

HYPRR  is  called  by  CALC  and,  in  turn,  calls  STIR.  STIR  calculates  the  natural 
logarithms  of  the  binomial  coefficients,  and  is  complicated  somewhat  by  special  cases. 
CALC  uses  STIR  until  a hypergeometric  term  of  at  least  TOL  is  obtained.  Generally 
a TOL  of  about  10-1^  is  reasonable.  A recursion  relation  is  used  to  calculate  further 
terms. Terms below 10^-6 are not calculated, but are assumed to be zero.

PROGRAM  PAR 

PAR  should  be  used  at  the  start  of  a run  to  determine  that  all  parameters  have 
been  set  correctly.  PAR'S  output  should  be  examined  periodically  to  ensure  that  a 
sufficient,  but  not  excessive,  number  of  rows  is  being  calculated. 

PROGRAM  NORM 

NORM requires setting the criterion value to be exceeded (Example: PC ← 0.845)
and two coefficients: BETA1 and BETA2. The coefficients are the cutoff values for the
tails of a standardized normal distribution for, respectively, the confidence level and the
success probability. For example, a confidence level of 0.1 and a success probability
of 90 percent represent 10-percent tails, with cutoffs of 1.282. The printout requires
specification of the confidence level and success probability (Examples: CONF ← 10
and PROB ← 90). If desired, it is straightforward to cause these to set the
coefficients, as was done in programs described earlier.
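
Deriving the two coefficients from the printed settings can be sketched in Python. The names CONF, PROB, BETA1, and BETA2 follow the text; the helper `tail_cutoff` is hypothetical, and the use of the standard library's `NormalDist` in place of a table lookup is an assumption of this sketch.

```python
from statistics import NormalDist

def tail_cutoff(tail_percent):
    """Standard normal cutoff leaving `tail_percent` percent in one tail."""
    return NormalDist().inv_cdf(1 - tail_percent / 100)

CONF = 10   # confidence level, in percent
PROB = 90   # success probability, in percent

BETA1 = tail_cutoff(CONF)         # cutoff for the confidence level
BETA2 = tail_cutoff(100 - PROB)   # cutoff for the success probability
print(round(BETA1, 3), round(BETA2, 3))  # 1.282 1.282
```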

PROGRAM  P 

The final program (table A-12) is used to print the correct sample size. The
parameter K is set to 1, 2, 3, or 4 to get the correct underlining. The size matrix R is
obtained from the same matrix in NORM by replacing, where necessary, the approximated
sample size. (By varying the initial value of K, NORM can be used to calculate any of
its four matrixes by itself.)